mat.1 <- matrix(1:12, nrow=4)
mat.1 [,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
Source of 95% problems
Julien Martin
University of Ottawa
April 4, 2024
recognise different types of data in R ✔️
understand some different data structures ✔️
learn how to import data into R ✔️
learn how to manipulate data in R ✔️
learn how to export data from R ✔️
six types of data in R
numeric - 1.618, 3.14, 2.718
integers - 1, 2, 3, 42, 101
logical - TRUE or FALSE
character - “BI5009”, “Blue”
complex
raw
five data structures
vector
matrix
array
data frame
list
one dimensional collection elements
can contain all data types
all elements must be of the same type
most commonly used data structure for statistical data analysis
powerful 2-dimensional vector holding structure
dataframes can hold vectors of any of the basic classes of data
treat nitrogen block height weight leafarea shootarea flowers
1 tip medium 1 7.5 7.62 11.7 31.9 1
2 tip medium 1 10.7 12.14 14.1 46.0 10
3 tip medium 1 11.2 12.76 7.1 66.7 10
4 tip medium 1 10.4 8.78 11.9 20.3 1
5 tip medium 1 10.4 13.58 14.5 26.9 4
6 tip medium 1 9.8 10.08 12.2 72.7 9
7 tip medium 1 6.9 10.11 13.2 43.1 7
8 tip medium 1 9.4 10.28 14.0 28.5 6
9 tip medium 2 10.4 10.48 10.5 57.8 5
10 tip medium 2 12.3 13.48 16.1 36.9 8
simplest method is to use spreadsheet
and then import data into R
save as a comma separated file (*.csv)
missing data represented with NA or empty cells. Do not mixed both
No spaces in names (use underscore if needed)
no uppercase
keep variable names short & informative
read.table() function is the workhorse with format specified function to make coding faster
| petunia | name of the new object |
| read.table() | function use tot read tha data |
| ‘data/flowers.csv’ | relative path name to the data file |
| header = TRUE | first line contain columns names |
| sep = ‘,’ | columns sparated by commas |
sep = "\t"petunia <- read.csv('flowers.csv') # if comma-separated
petunia <- read.delim('flowers.txt') # if tab-separatedfunctions in the foreign package allows you to import files of other formats (i.e. from SAS, SPSS, Minitab etc)
use the readxl or readODS package to directly import MS Excel or LO calc spreadsheets directly
to view the contents of a data frame, type it’s name
rarely a good idea as just fills up your console
use the head() and tail() to see first/last 6 lines
[1] flowers
treat nitrogen block height weight leafarea shootarea flowers
1 tip medium 1 7.5 7.62 11.7 31.9 1
2 tip medium 1 10.7 12.14 14.1 46.0 10
3 tip medium 1 11.2 12.76 7.1 66.7 10
4 tip medium 1 10.4 8.78 11.9 20.3 1
5 tip medium 1 10.4 13.58 14.5 26.9 4
6 tip medium 1 9.8 10.08 12.2 72.7 9
str() function: data frame dimensions, list of variables, type of variables'data.frame': 96 obs. of 8 variables:
$ treat : chr "tip" "tip" "tip" "tip" ...
$ nitrogen : chr " medium" " medium" " medium" " medium" ...
$ block : int 1 1 1 1 1 1 1 1 2 2 ...
$ height : num 7.5 10.7 11.2 10.4 10.4 9.8 6.9 9.4 10.4 12.3 ...
$ weight : num 7.62 12.14 12.76 8.78 13.58 ...
$ leafarea : num 11.7 14.1 7.1 11.9 14.5 12.2 13.2 14 10.5 16.1 ...
$ shootarea: num 31.9 46 66.7 20.3 26.9 72.7 43.1 28.5 57.8 36.9 ...
$ flowers : int 1 10 10 1 4 9 7 6 5 8 ...
names() function: vector of variable names$ notation [1] 7.5 10.7 11.2 10.4 10.4 9.8 6.9 9.4 10.4 12.3 10.4 11.0 7.1 6.0 9.0
[16] 4.5 12.6 10.0 10.0 8.5 14.1 10.1 8.5 6.5 11.5 7.7 6.4 8.8 9.2 6.2
[31] 6.3 17.2 8.0 8.0 6.4 7.6 9.7 12.3 9.1 8.9 7.4 3.1 7.9 8.8 8.5
[46] 5.6 11.5 5.8 5.6 5.3 7.5 4.1 3.5 8.5 4.9 2.5 5.4 3.9 5.8 4.5
[61] 8.0 1.8 2.2 3.9 8.5 8.5 6.4 1.2 2.6 10.9 7.2 2.1 4.7 5.0 6.5
[76] 2.6 6.0 9.3 4.6 5.2 3.9 2.3 5.2 2.2 4.5 1.8 3.0 3.7 2.4 5.7
[91] 3.7 3.2 3.9 3.3 5.5 4.4
you can extract elements using the [rowIndex, columnIndex] method
Index can either be a positional index or a logical index
provide the row and column position of the data you wish to extract
index can either be a positional index or a logical index
1flowers[c(3, 8, 20), c(1, 4, 5, 6)] treat height weight leafarea
3 tip 11.2 12.76 7.1
8 tip 9.4 10.28 14.0
20 tip 8.5 14.33 13.2
can assign these extracted values to another object if you want
new object inherits data.frame class
we can use a short cut if we want all rows or all columns extracted
omitting the column index is shorthand for ‘all columns’
treat nitrogen block height weight leafarea shootarea flowers
1 tip medium 1 7.5 7.62 11.7 31.9 1
2 tip medium 1 10.7 12.14 14.1 46.0 10
3 tip medium 1 11.2 12.76 7.1 66.7 10
columnIndex treat nitrogen leafarea shootarea
1 tip medium 11.7 31.9
2 tip medium 14.1 46.0
3 tip medium 7.1 66.7
4 tip medium 11.9 20.3
5 tip medium 14.5 26.9
6 tip medium 12.2 72.7
7 tip medium 13.2 43.1
8 tip medium 14.0 28.5
9 tip medium 10.5 57.8
10 tip medium 16.1 36.9
we can also extract rows based on a logical test
example, all rows where the height variable is greater than 12
treat nitrogen block height weight leafarea shootarea flowers
10 tip medium 2 12.3 13.48 16.1 36.9 8
17 tip high 1 12.6 18.66 18.6 54.0 9
21 tip high 1 14.1 19.12 13.1 113.2 13
32 tip high 2 17.2 19.20 10.9 89.9 14
38 tip low 1 12.3 11.27 13.7 28.7 5
leafarea is equal to 8.7we can combine logical tests using the & symbol (AND) or the | symbol (OR)
example, extract all rows where height is > 10.5 and nitrogen is equal to "medium"
height is greater than 12.3 OR less than 1.8write.csv() function exports data frames to an external file
saves flowers data frame to a file named ‘flowers.csv’
row.names = FALSE argument supresses the row names in the file
there are many other options for importing and exporting data in R
the fread() and fwrite() functions in the read.table package are blazingly fast
the read_csv() and write_csv() (and other related) functions from the readr package for tidyverse alternatives
if you have a lot of data (I mean alot!) then take a look at the ff and bigmemory packages
Credit: I borrowed slides from Alex Douglas